Virtual register file

ABSTRACT

The present disclosure is related to a virtual register file. Source code can be compiled to include references to a virtual register file for data subject to a logical operation. The references can be dereferenced at runtime to obtain physical addresses of memory device elements according to the virtual register file. The logical operation can be performed in the memory device on data stored in the memory device elements.

PRIORITY INFORMATION

This application is a Continuation of U.S. application Ser. No. 16/054,702, filed Aug. 3, 2018, which issues as U.S. Pat. No. 10,963,398 on Mar. 30, 2021, which is a Continuation of U.S. application Ser. No. 15/085,631, filed Mar. 30, 2016, which issued as U.S. Pat. No. 10,049,054 on Aug. 14, 2018, which claims the benefit of U.S. Provisional Application No. 62/141,601, filed Apr. 1, 2015, the contents of which are included herein by reference.

TECHNICAL FIELD

The present disclosure relates generally to semiconductor memory and methods, and more particularly, to a virtual register file.

BACKGROUND

Memory devices are typically provided as internal, semiconductor, integrated circuits in computing devices or other electronic devices. There are many different types of memory including volatile and non-volatile memory. Volatile memory can require power to maintain its data (e.g., user data, error data, etc.) and includes random-access memory (RAM), dynamic random access memory (DRAM), and synchronous dynamic random access memory (SDRAM), among others. Non-volatile memory can provide persistent data by retaining stored data when not powered and can include NAND flash memory, NOR flash memory, read only memory (ROM), Electrically Erasable Programmable ROM (EEPROM), Erasable Programmable ROM (EPROM), and resistance variable memory such as phase change random access memory (PCRAM), resistive random access memory (RRAM), and magnetoresistive random access memory (MRAM), such as spin torque transfer random access memory (STT RAM), among others.

Computing systems often include a number of processing resources (e.g., one or more processors), which may retrieve and execute instructions and store the results of the executed instructions to a suitable location. A processor can comprise a number of functional units (e.g., herein referred to as functional unit circuitry (FUC)) such as arithmetic logic unit (ALU) circuitry, floating point unit (FPU) circuitry, and/or a combinatorial logic block, for example, which can execute instructions to perform logical operations such as AND, OR, NOT, NAND, NOR, and XOR logical operations on data (e.g., one or more operands).

A number of components in a computing system may be involved in providing instructions to the functional unit circuitry for execution. The instructions may be generated, for instance, by a processing resource such as a controller and/or host processor. Data (e.g., the operands on which the instructions will be executed to perform the logical operations) may be stored in a memory array that is accessible by the FUC. The instructions and/or data may be retrieved from the memory array and sequenced and/or buffered before the FUC begins to execute instructions on the data. Furthermore, as different types of operations may be executed in one or multiple clock cycles through the FUC, intermediate results of the operations and/or data may also be sequenced and/or buffered. In many instances, the processing resources (e.g., processor and/or associated FUC) may be external to the memory array, and data can be accessed (e.g., via a bus between the processing resources and the memory array) to execute instructions. Data can be moved from the memory array to registers external to the memory array via a bus.

A register file is an array of processor registers in a central processing unit (CPU). Integrated circuit-based register files may be implemented, for example, by static random access memory (SRAM). The instruction set architecture of a CPU may define a set of registers used to stage data between memory and the FUC. The register file may be visible to the programmer, as opposed to the cache, which may not be visible to the programmer.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an apparatus in the form of a computing system including at least one memory system in accordance with a number of embodiments of the present disclosure.

FIG. 2 is a schematic diagram of a portion of a memory device in accordance with a number of embodiments of the present disclosure.

FIG. 3 is a block diagram illustrating virtual register file memory translation according to a number of embodiments of the present disclosure.

DETAILED DESCRIPTION

Given the complexity of modern computer architectures, software programmers and software tool chains (e.g., compilers, debuggers, etc.) can have more difficulty extracting efficient performance from a target platform. Adding processing in memory (PIM) devices can further complicate the architecture. Most modern computer architectures use a register-memory technique, where operations are executed in two separate domains. Logical operations (e.g., arithmetic, flow control, and combinatorial operations) are generally executed on a number of register files. Memory operations (e.g., load, store, etc.) are generally executed on memory devices. Instructions in register-memory architectures utilize register indices or memory addresses to indicate how/where to perform an operation.

PIM computing architectures and/or devices can be classified as memory-memory devices in computing architecture taxonomies. This implies that both logical operations and memory operations are performed on the memory devices in-situ. Instructions in memory-memory architectures use physical addresses to indicate how/where to perform an operation.

Modern applications and operating systems use the notions of relocation and virtual addressing, which imply that an application can be loaded or relocated into different physical memory spaces due to the fact that the actual addressing is virtualized. The application and operating system reside in the virtual address space. The hardware and system architecture dereference these virtual addresses to their physical addresses when memory requests are made. However, any system architecture that includes the use of a PIM device that natively relies on physical addressing conflicts with the notion of virtualizing memory.

Some embodiments of the present disclosure can expose low-level memory functionality provided by a PIM device using a register-memory layer. This layer is referred to herein as a virtual register file. The ability to provide register-memory access to a PIM device significantly decreases the level of customization that would otherwise be used in applications for the PIM device via an optimizing compiler. Otherwise, the applications would be customized for each different kind of PIM device or devices included in a system. Providing optimizing compilers and runtime systems with the ability to target what appears to be a register-memory architecture can greatly simplify an implementation that uses a PIM device. Abstracting the physical addressing mechanisms of a PIM device such that virtual addressing may sufficiently exist without collisions can be beneficial for integrating PIM devices with systems that generally operate on a register-memory architecture.

The present disclosure is related to a virtual register file. Source code can be compiled to include references to a virtual register file for data subject to a logical operation. The references can be dereferenced at runtime to obtain physical addresses of memory device elements according to the virtual register file. The logical operation can be performed in the memory device on data stored in the memory device elements.

In the following detailed description of the present disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration how a number of embodiments of the disclosure may be practiced. These embodiments are described in sufficient detail to enable those of ordinary skill in the art to practice the embodiments of this disclosure, and it is to be understood that other embodiments may be utilized and that process, electrical, and/or structural changes may be made without departing from the scope of the present disclosure. As used herein, the designators “M” and “N”, particularly with respect to reference numerals in the drawings, indicate that a number of the particular feature so designated can be included. As used herein, “a number of” a particular thing can refer to one or more of such things (e.g., a number of memory devices can refer to one or more memory devices). As used herein, the terms “first” and “second” are used to differentiate one feature from another and do not necessarily imply an order between the features so designated.

The figures herein follow a numbering convention in which the first digit or digits correspond to the drawing figure number and the remaining digits identify an element or component in the drawing. Similar elements or components between different figures may be identified by the use of similar digits. For example, 110 may reference element “10” in FIG. 1, and a similar element may be referenced as 210 in FIG. 2. Multiple analogous elements within one figure may be referenced with a reference numeral followed by a hyphen and another numeral or a letter. For example, 240-1 may reference element 40-1 in FIG. 2 and 240-N may reference element 40-N, which can be analogous to element 240-1. Such analogous elements may be generally referenced without the hyphen and extra numeral or letter. For example, elements 240-1, . . . , 240-N may be generally referenced as 240. As will be appreciated, elements shown in the various embodiments herein can be added, exchanged, and/or eliminated so as to provide a number of additional embodiments of the present disclosure. In addition, as will be appreciated, the proportion and the relative scale of the elements provided in the figures are intended to illustrate certain embodiments of the present invention, and should not be taken in a limiting sense.

FIG. 1 is a block diagram of an apparatus in the form of a computing system 100 including at least one memory system 104 in accordance with a number of embodiments of the present disclosure. As used herein, a host 102, a memory system 104, a memory device 110, a memory array 111, and/or sensing circuitry 124 might also be separately considered an “apparatus.”

The computing system 100 can include a host 102 coupled to memory system 104, which includes a memory device 110 (e.g., including a memory array 111 and/or sensing circuitry 124). The host 102 can be a host system such as a personal laptop computer, a desktop computer, a digital camera, a mobile telephone, or a memory card reader, among various other types of hosts. The host 102 can include a system motherboard and/or backplane and can include a number of processing resources (e.g., one or more processors, microprocessors, or some other type of controlling circuitry), such as central processing unit (CPU) 106. The CPU 106 can be coupled to secondary storage 114 and to main memory 112 via a memory bus 116. The secondary storage 114 can be a storage device or other media not directly accessible by the CPU 106 such as hard disk drives, solid state drives, optical disc drives, and can be non-volatile memory. The main memory 112 is directly accessible by the CPU 106. The main memory 112 can be volatile memory such as DRAM. The memory bus 116 can be analogous to the control bus 136 and the I/O bus 138, but for communication between the CPU 106 and the main memory 112 instead of for communication between the host 102 and the memory system 104. The CPU 106 can include a logic unit 118 coupled to a number of registers 120 and cache 122. The cache 122 can be an intermediate stage between the relatively faster registers 120 and the relatively slower main memory 112. Data to be operated on by the CPU 106 may be copied to cache 122 before being placed in a register 120, where the operations can be effected by the logic unit 118. Although not specifically illustrated, the cache 122 can be a multilevel hierarchical cache.

The computing system 100 can include separate integrated circuits or both the host 102 and the memory system 104 can be on the same integrated circuit. The computing system 100 can be, for instance, a server system and/or a high performance computing system and/or a portion thereof. Although the example shown in FIG. 1 illustrates a system having a Von Neumann architecture, embodiments of the present disclosure can be implemented in non-Von Neumann architectures (e.g., a Turing machine), which may not include one or more components (e.g., CPU, ALU, etc.) often associated with a Von Neumann architecture.

For clarity, the computing system 100 has been simplified to focus on features with particular relevance to the present disclosure. The memory array 111 can be a hybrid memory cube (HMC), processing in memory random access memory (PIMRAM) array, DRAM array, SRAM array, STT RAM array, PCRAM array, TRAM array, RRAM array, NAND flash array, and/or NOR flash array, for instance. The memory array 111 can comprise memory cells arranged in rows coupled by access lines (which may be referred to herein as word lines or select lines) and columns coupled by sense lines (which may be referred to herein as digit lines or data lines). Although a single memory device 110 is shown in FIG. 1, embodiments are not so limited. For instance, memory system 104 may include a number of memory devices 110 (e.g., a number of banks of DRAM cells).

The memory system 104 can include address circuitry 126 to latch address signals provided over an I/O bus 138 (e.g., a data bus) through I/O circuitry 130. Address signals can be received and decoded by a row decoder 128 and a column decoder 134 to access the memory device 110. Data can be read from the memory array 111 by sensing voltage and/or current changes on the sense lines using sensing circuitry 124. The sensing circuitry 124 can read and latch a page (e.g., row) of data from the memory array 111. The I/O circuitry 130 can be used for bi-directional data communication with host 102 over the I/O bus 138. The write circuitry 132 can be used to write data to the memory device 110.

Controller 108 can decode signals provided by control bus 136 from the host 102. These signals can include chip enable signals, write enable signals, and address latch signals that are used to control memory operations performed on the memory device 110, including data read, data write, and data erase operations. The signals can also be used to control logical operations performed on the memory device 110 including arithmetic, flow control, and combinatorial operations, among others. In various embodiments, the controller 108 is responsible for executing instructions from the host 102. The controller 108 can be a state machine, a sequencer, a processor, and/or other control circuitry.

An example of the sensing circuitry 124 is described further below in association with FIG. 2. For instance, in a number of embodiments, the sensing circuitry 124 can comprise a number of sense amplifiers and a number of compute components, which may comprise a latch serving as an accumulator and that can be used to perform logical operations (e.g., on data associated with complementary sense lines). Logical operations can include Boolean operations (e.g., AND, OR, NOR, XOR, etc.), combinations of Boolean operations to perform other mathematical operations, as well as non-Boolean operations. In a number of embodiments, the sensing circuitry 124 can be used to perform logical operations using data stored in the memory array 111 as inputs and store the results of the logical operations back to the memory array 111 without transferring via a sense line address access (e.g., without firing a column decode signal). As such, a logical operation can be performed using sensing circuitry 124 rather than and/or in addition to being performed by processing resources external to the sensing circuitry 124 (e.g., by the host CPU 106 and/or other processing circuitry, such as ALU circuitry, located on the memory system 104, such as on the controller 108, or elsewhere).

In various previous approaches, data associated with a logical operation, for instance, would be read from memory via sensing circuitry and provided to registers 120 associated with the host CPU 106. A logic unit 118 of the host CPU 106 would perform the logical operations using the data (which may be referred to as operands or inputs) from the memory array 111 in the registers 120 and the result could be transferred back to the memory array 111 via the local I/O lines. In contrast, in a number of embodiments of the present disclosure, sensing circuitry 124 can be configured to perform a logical operation on data stored in memory cells in memory array 111 and store the result back to the array 111 without enabling a local I/O line coupled to the sensing circuitry and without using registers 120 of the host CPU 106.

As such, in a number of embodiments, registers 120 and/or a logic unit 118 of a host CPU 106 external to the memory array 111 and sensing circuitry 124 may not be needed to perform the logical operation as the sensing circuitry 124 can be operated to perform the logical operation using the address space of memory array 111. Additionally, the logical operation can be performed without the use of an external processing resource.

The host 102 can be configured with an operating system. The host 102 can be coupled to the memory device 110 (e.g., via the control bus 136 and/or the I/O bus 138). The operating system is executable instructions (software) that manage hardware resources and provide services for other executable instructions (applications) that run on the operating system. The operating system can implement a virtual memory system.

According to the present disclosure, the CPU 106 can execute instructions to define a buffer in the main memory 112 of the host 102 with sufficient space to contain backing storage for a virtual register file (VRF) 117. The instructions can be executed to logically split the buffer into a number of virtual vector registers (VVR) 119, a number of virtual scalar registers (VSR) 121, and a number of virtual control registers (VCR) 123, among other virtualized components as described herein, which collectively define the virtual register file 117. The host 102 can create the virtual register file 117 at runtime. These virtual registers 119, 121, 123 can represent a number of the registers 120 of the CPU 106 (e.g., physical vector registers, physical scalar registers, and/or physical control registers) for logical operations to be performed in the memory device 110. The virtual registers can represent physical registers 120 of the CPU 106 with respective indices to the virtual register file 117 at compile time as described in more detail in association with FIG. 3. The virtual register file 117, specifically, the virtual vector registers 119, can store virtual addresses (e.g., base virtual addresses) of elements of the memory device 110 (e.g., the PIM memory device 110 illustrated in FIG. 1). A memory element (also referred to as a computational element) stores an amount of data that is operated on in one logical operation. The memory element can refer to a number of memory cells that store the amount of data. Memory-memory architectures may be prohibitively difficult to expose to high performance software and compiler implementations.
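As a non-limiting illustration, the logical split of one host buffer into virtual vector, scalar, and control registers can be sketched as follows. The class name, the register counts, the 64-bit little-endian entry size, and the fixed-size treatment of control registers are assumptions made for the sketch and are not taken from the disclosure.

```python
import struct

ENTRY_BYTES = 8  # assumption: every virtual register is backed by one 64-bit entry


class VirtualRegisterFile:
    """Hypothetical host-side virtual register file backed by a single buffer."""

    def __init__(self, n_vector, n_scalar, n_control):
        # Logically split one contiguous buffer into three register groups.
        # Simplification: control registers get fixed-size entries here.
        self.layout = {}
        offset = 0
        for kind, count in (("vvr", n_vector), ("vsr", n_scalar), ("vcr", n_control)):
            self.layout[kind] = (offset, count)
            offset += count
        self.buffer = bytearray(offset * ENTRY_BYTES)

    def _slot(self, kind, i):
        # Byte offset of register i within the given group.
        base, count = self.layout[kind]
        if not 0 <= i < count:
            raise IndexError(f"{kind}[{i}] is out of range")
        return (base + i) * ENTRY_BYTES

    def write(self, kind, i, value):
        struct.pack_into("<Q", self.buffer, self._slot(kind, i), value)

    def read(self, kind, i):
        return struct.unpack_from("<Q", self.buffer, self._slot(kind, i))[0]


vrf = VirtualRegisterFile(n_vector=32, n_scalar=32, n_control=8)
# A virtual vector register holds a (base) virtual address, not the data itself.
vrf.write("vvr", 3, 0x7F0000001000)
```

In this sketch the buffer never holds operand data; each entry only records where an element resides in the virtual address space.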

In some embodiments, the host 102 can include a memory management unit (MMU) 115. The MMU 115 is a hardware component that can perform translation between virtual memory addresses and physical memory addresses. That is, the MMU 115 can translate the virtual memory addresses stored in the virtual register file 117 to physical addresses of the elements of the memory device 110. Thus, the virtual register file 117 does not store the physical addresses of the elements of the memory device 110. In this regard, the virtual register file 117 does not need to be updated when data is moved within the memory device 110. Furthermore, unlike physical registers 120 associated with the CPU 106 for operations on data stored in the main memory 112, the virtual registers in the virtual register file 117 do not receive or store data corresponding to the elements of the memory device 110.
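The reason the virtual register file need not be updated when data moves can be sketched with a toy translation table. The `ToyMMU` class, the single-level page table, and the 4096-byte page size are assumptions for illustration only; an actual MMU is a hardware component and is not modeled here.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class ToyMMU:
    """Hypothetical single-level page table standing in for an MMU."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> physical frame number

    def map(self, vpn, pfn):
        self.page_table[vpn] = pfn

    def translate(self, vaddr):
        # Split the virtual address into page number and offset, then swap
        # the page number for the frame number currently mapped to it.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.page_table[vpn] * PAGE_SIZE + offset


mmu = ToyMMU()
mmu.map(0x10, 0x200)

# The virtual register file entry stores only a virtual address.
vrf_entry = 0x10 * PAGE_SIZE + 0x40
before = mmu.translate(vrf_entry)

# Data is moved to a different physical location: only the mapping changes.
mmu.map(0x10, 0x355)
after = mmu.translate(vrf_entry)
```

The stored entry is identical before and after the move; only the translation result differs.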

FIG. 2 is a schematic diagram of a portion of a memory device 210 in accordance with a number of embodiments of the present disclosure. The memory device 210 is analogous to the memory device 110 illustrated in FIG. 1. The memory device 210 can include a memory array 211 that includes memory cells 240-1, 240-2, 240-3, 240-4, 240-5, 240-6, 240-7, 240-8, . . . , 240-N coupled to rows of access lines 242-1, 242-2, 242-3, 242-4, 242-5, 242-6, 242-7, . . . , 242-M and columns of sense lines 244-1, 244-2, 244-3, 244-4, 244-5, 244-6, 244-7, 244-8, . . . , 244-N. The memory array 211 is not limited to a particular number of access lines and/or sense lines, and use of the terms “rows” and “columns” does not intend a particular physical structure and/or orientation of the access lines and/or sense lines. Although not pictured, each column of memory cells can be associated with a corresponding pair of complementary sense lines.

Each column of memory cells can be coupled to sensing circuitry 224, which can be analogous to sensing circuitry 124 illustrated in FIG. 1. In this example, the sensing circuitry includes a number of sense amplifiers 246-1, 246-2, 246-3, 246-4, 246-5, 246-6, 246-7, 246-8, . . . , 246-N coupled to the respective sense lines 244. The sense amplifiers 246 are coupled to input/output (I/O) line 254 (e.g., a local I/O line) via access devices (e.g., transistors) 250-1, 250-2, 250-3, 250-4, 250-5, 250-6, 250-7, 250-8, . . . , 250-N. In this example, the sensing circuitry also includes a number of compute components 248-1, 248-2, 248-3, 248-4, 248-5, 248-6, 248-7, 248-8, . . . , 248-N coupled to the respective sense lines 244. Column decode lines 252-1, 252-2, 252-3, 252-4, 252-5, 252-6, 252-7, 252-8, . . . , 252-N are coupled to the gates of access devices 250 respectively, and can be selectively activated to transfer data sensed by respective sense amps 246 and/or stored in respective compute components 248 to a secondary sense amplifier 256. In a number of embodiments, the compute components 248 can be formed on pitch with the memory cells of their corresponding columns and/or with the corresponding sense amplifiers 246.

In a number of embodiments, the sensing circuitry (e.g., compute components 248 and sense amplifiers 246) is configured to perform a number of logical operations on elements stored in array 211. As an example, a first plurality of elements can be stored in a first group of memory cells coupled to a particular access line (e.g., access line 242-1) and to a number of sense lines 244, and a second plurality of elements can be stored in a second group of memory cells coupled to a different access line (e.g., access line 242-2) and the respective number of sense lines 244. Each element of the first plurality of elements can have a logical operation performed thereon with a respective one of the second plurality of elements, and the result of the logical operation can be stored (e.g., as a bit-vector) in a third group of memory cells coupled to a particular access line (e.g., access line 242-3) and to the number of sense lines 244.
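The row-wise operation described above can be modeled in software as follows. The model is a simplification: it assumes one bit per column, represents the array as a dictionary keyed by access line label, and uses a hypothetical `row_logical_op` helper; it does not model the sense amplifiers or compute components themselves.

```python
def row_logical_op(array, src_a, src_b, dest, op):
    """Combine two rows column by column and store the result in a third row."""
    array[dest] = [op(a, b) for a, b in zip(array[src_a], array[src_b])]


# Each list position stands for one sense line (column); values are single bits.
array = {
    "242-1": [1, 0, 1, 1, 0, 0, 1, 0],
    "242-2": [1, 1, 0, 1, 0, 1, 1, 0],
    "242-3": [0, 0, 0, 0, 0, 0, 0, 0],
}

# Bitwise AND of the two source rows, written back into the array.
row_logical_op(array, "242-1", "242-2", "242-3", lambda a, b: a & b)
```

Every column is combined independently of the others, which is what allows the result row to be produced without moving either operand out of the array.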

FIG. 3 is a block diagram illustrating virtual register file memory translation according to a number of embodiments of the present disclosure. The host (e.g., host 102 illustrated in FIG. 1) can compile source code 360, such as C, C++, etc., with a compiler 362. The source code 360 can be compiled to include references to a virtual register file 317 for data subject to a logical operation. The compiler 362 can be configured to emit compiled code 364 (e.g., machine code, object code, etc.) with references to the virtual register file 317 in terms of indices to virtual vector registers in the virtual register file 317. The source code 360 can be compiled to target the virtual register file 317 for logical operations to be performed in the memory device 310 as though the logical operations were to be performed in the virtual register file 317 on the host (e.g., host 102 illustrated in FIG. 1). That is, the virtual register file 317 can be targeted as though it was a physical register (e.g., physical register 120 illustrated in FIG. 1) for the host device even though the actual logical operation is to be performed in the memory device 310 on the memory system (e.g., memory system 104 illustrated in FIG. 1).

In order to provide functionality for programmers and compilers to generate code for the memory device 310, the virtual register file 317 is defined as the basis for the mapping between virtualized memory devices and the actual processing and memory hardware. Some previous source code compiler technology was made aware of or targeted a fixed set of mutable (changeable) hardware elements for the purpose of safe code generation. This hardware would generally include computational elements (e.g., logic unit 118 illustrated in FIG. 1) and register files (e.g., registers 120 illustrated in FIG. 1). However, the memory device 310 does not have mutable register files. Operations are to be performed at or near the physical row and column intersection of the memory array (e.g., memory array 111 illustrated in FIG. 1) or in the in-situ storage in the memory device (e.g., memory device 110 illustrated in FIG. 1). If the memory device 310 were to function according to some previous compiler approaches, each memory cell in the memory device 310 would become a mutable and allocable hardware element, which could prevent applications from executing alongside one another in virtual memory.

The virtual register file 317 can virtualize access to the memory cells of the memory device 310. The virtual register file 317 can be stored in a buffer in memory (e.g., main memory 112 of a host 102 illustrated in FIG. 1) and can function similar to a register file, except that the actual data to be operated on is not transferred to the virtual register file 317, or even to the host, because the operation is to be performed in the memory device 310 at runtime, rather than performing the logical operation with the host. The virtual register file 317 can be configured to store addresses of respective target elements rather than the actual data. There is no motivation to virtualize a physical register used according to some previous approaches because the physical register receives the data to be operated on by a logic unit of the host.

The entries in the virtual register file 317 can be translated at runtime from virtual address contents to respective physical addresses such that steps can be taken to initiate a logical operation using the memory device 310. Each element of the buffer that provides the backing storage for the virtual register file can be of a particular size (e.g., 64 bits) such that the buffer may be indexed analogously to a register file according to some previous approaches. The compiler 362 can be configured to target the virtual register file 317 as opposed to physical hardware entities (e.g., registers 120 illustrated in FIG. 1) for operations to be performed in the memory device 310.

The compiled code 364 can include logical operations 368 and references 366 to the virtual register file 317; however, the compiled code will not include a physical address corresponding to the virtual address associated with the virtual vector register. Some examples of the logical operations include add, subtract, multiply, etc. A particular reference 366 can be an index (e.g., %v0, %v1, . . . , etc.) to the virtual register file 317. A particular index can point to a virtual register in the virtual register file 317, such as a virtual vector register, a virtual scalar register, and/or a virtual control register, among others.
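One way to picture compiled code that carries only indices is sketched below. The textual form `add %v2, %v0, %v1`, the three-operand layout, and the `PimInstruction` type are hypothetical and chosen only for illustration; the disclosure does not specify an encoding.

```python
from typing import NamedTuple


class PimInstruction(NamedTuple):
    """Hypothetical compiled instruction: an operation plus register file indices."""
    op: str
    dest: int
    src_a: int
    src_b: int


def parse_reference(token):
    # Turn a register-style reference such as "%v2" into a plain index.
    token = token.strip()
    if not token.startswith("%v"):
        raise ValueError(f"not a virtual register reference: {token!r}")
    return int(token[2:])


def parse_instruction(line):
    op, operands = line.split(maxsplit=1)
    dest, src_a, src_b = (parse_reference(t) for t in operands.split(","))
    return PimInstruction(op, dest, src_a, src_b)


instr = parse_instruction("add %v2, %v0, %v1")
```

The parsed instruction contains an operation name and three small integers; no address of any kind, virtual or physical, appears in it.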

A respective index to the virtual register file 317 can represent a vector register or a scalar register with a fixed amount of backing storage of the virtual register file 317. A respective index to the virtual register file 317 can represent a control register with a variable amount of the backing storage. Virtual addresses for elements of the memory device 310 can be stored in virtual vector registers of the virtual register file 317.

At runtime, the references to the virtual register file can be dereferenced (e.g., by virtual to physical translation 374) to obtain physical addresses of memory device elements. The dereferencing (e.g., virtual to physical translation 374) can include use of the runtime environment of the memory device 310 and/or use of a memory management unit of the host (e.g., the MMU 115 of the host 102 illustrated in FIG. 1). The logical operation can then be performed in the memory device 310 on data that was stored in the memory device elements.
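The two-step dereferencing described above (index to virtual address, then virtual address to physical address) can be sketched as follows. The `dereference` helper, the dictionary used as a register file, and the fixed-offset `translate` function are assumptions for the sketch; an actual translation would go through a runtime environment and/or an MMU.

```python
def dereference(instr, vrf, translate):
    """Replace each register file index with the physical address it resolves to."""
    op, *indices = instr
    # Step 1: index -> virtual address held in the virtual register file.
    # Step 2: virtual address -> physical address via the supplied translation.
    return (op, *[translate(vrf[i]) for i in indices])


# Hypothetical register file contents: index -> virtual address of an element.
vrf = {0: 0x1000, 1: 0x2000, 2: 0x3000}


def translate(vaddr):
    # Toy stand-in for virtual-to-physical translation (fixed offset).
    return vaddr + 0x80000000


# (operation, destination index, source index, source index)
command = dereference(("add", 2, 0, 1), vrf, translate)
```

The resulting command names physical locations only, so it is in a form that could be issued to a device that operates on physical addresses.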

In some embodiments, the virtual register file 317 can also be used for memory operations (in addition to logical operations). For example, the source code 360 can be compiled to include a reference to the virtual register file 317 for data subject to a memory operation. The reference to the virtual register file can be dereferenced at runtime to obtain a physical address of a memory device element according to the virtual register file 317.

The following table illustrates an example of a virtual register file structure including indices, mnemonics, and descriptions of a number of components of the virtual register file:

TABLE 1

Index       Mnemonic   Description
0x00-0x0F   T0-T15     Temporary Row Registers
0x14-0x17   B0-B3      Bank Registers
0x1A        CB         Bank Control Register
0x1B        ACC0       Accumulator
0x20-0x3F   V0-V31     Vector Registers
0x40-0x5F   S0-S31     Scalar Registers
0x60        VL         Vector Length (in elements)
0x61        VS         Vector Stride (in bits)
0x62        VF         Vector First
0x63        AB         Arbitrary Bit Integer Width (in bits)
0x64        FRAC       Arbitrary Float/Fixed Fractional Width (in bits)
0x65        EXP        Arbitrary Float/Fixed Exponent Width (in bits)
0x66        SP         Stack Pointer
0x67        FP         Frame Pointer
0x68        RP         Return Pointer
0x69        AP         Argument Pointer
0x6A        EMASK      Exception Mask
0x6B        DMASK      Device Mask
0x6C        BMASK      Bank Mask
0x70        MAXROW     Number of rows in subarray
0x71        MAXCOL     Number of columns in subarray
0x72        MAXSA      Number of subarrays in bank
0x73        MAXBANK    Number of banks in device
0x74        MAXDEV     Number of devices in system
0x75        MAXTMP     Number of temporary rows in subarray
0x76        VLMAX.H    Maximum horizontal vector length (in bits)
0x77        VLMAX.V    Maximum vertical vector length (in bits)
0x78        CONFIG     Config register
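For illustration, part of Table 1 can be expressed as a lookup from index to mnemonic. The `mnemonic` function and the choice to encode only a subset of the rows are assumptions of the sketch; the index ranges and mnemonics themselves are taken from the table.

```python
# Banked register ranges from Table 1: (first index, last index, mnemonic prefix).
REGISTER_RANGES = [
    (0x00, 0x0F, "T"),  # Temporary Row Registers T0-T15
    (0x14, 0x17, "B"),  # Bank Registers B0-B3
    (0x20, 0x3F, "V"),  # Vector Registers V0-V31
    (0x40, 0x5F, "S"),  # Scalar Registers S0-S31
]

# A subset of the single-index entries from Table 1.
NAMED_REGISTERS = {0x1A: "CB", 0x1B: "ACC0", 0x60: "VL", 0x61: "VS", 0x62: "VF"}


def mnemonic(index):
    """Return the Table 1 mnemonic for an index, for the rows encoded above."""
    if index in NAMED_REGISTERS:
        return NAMED_REGISTERS[index]
    for first, last, prefix in REGISTER_RANGES:
        if first <= index <= last:
            return f"{prefix}{index - first}"
    raise KeyError(f"index {index:#04x} is not covered by this sketch")
```

For example, `mnemonic(0x25)` yields `V5` because 0x25 lies five entries past the start of the vector register range.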

Although not specifically illustrated in the table above, the virtual register file 317 can store virtual addresses corresponding to physical addresses of elements of the memory device 310. In some embodiments, the virtual memory address can be a base virtual memory address. The translated base virtual address, in combination with a stored stride of memory device elements and a length of memory device elements, defines which memory device elements correspond to the virtual vector register that stores the virtual memory address. However, the virtual register file 317 does not store physical addresses of the elements of the memory device 310.
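The way a base address, a stride, and a length together select memory device elements can be sketched as simple address arithmetic. The `element_addresses` helper is hypothetical, and the sketch assumes the stride is expressed in the same units as the address, which the disclosure does not state.

```python
def element_addresses(base_physical, stride, length):
    """List the element addresses selected by a translated base, a stride, and a length.

    Assumption: `stride` uses the same units as `base_physical`.
    """
    return [base_physical + i * stride for i in range(length)]


# Four elements spaced eight address units apart, starting at the translated base.
addresses = element_addresses(base_physical=0x1000, stride=8, length=4)
```

Only the base needs to be stored as an address; the stride and length are small values that can be kept in control registers.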

Virtual to physical translation 374 can occur at runtime according to the runtime library 372, which can be part of the runtime environment of the host and/or the memory device 310. The translation can occur in response to a command calling for a logical operation to be performed in the memory device 310. For example, the compiled source code 364 can be executed to cause the logical operation to be performed in the memory device. The logical operation can be initialized according to a portion of the compiled source code 364 that addresses a particular virtual address. The logical operation can be performed in the memory device 310 on data stored in a particular physical address corresponding to the particular virtual address.

The runtime library 372 can be loaded by an application running on the host. For example, the runtime library 372 and the virtual register file 317 can be loaded into main memory of the host. Although not specifically illustrated as such, the runtime library 372 can create and/or contain the virtual register file 317 including virtual memory addresses of memory device elements. In some embodiments, each running thread can be associated with one virtual register file 317. The virtual register file 317 can be mapped to a particular memory bank of the memory device 310 at runtime, but can later be mapped to a different bank. For example, the memory device 310 can include multiple banks, each including a number of subarrays. The application that is running on the host can be relocated in the main memory of the host without editing the virtual register file 317, while maintaining the functionality provided by the virtual register file 317.
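As one possible illustration of a per-thread virtual register file that is remapped to a different bank without being edited, consider the following Python sketch. The class `BankMapping`, the function `get_register_file`, and the contiguous-bank address layout are assumptions made for the example only and do not describe the actual organization of the memory device 310.

```python
import threading


class BankMapping:
    """Hypothetical runtime state: which bank a register file currently targets."""

    def __init__(self, bank: int, bank_size: int):
        self.bank = bank
        self.bank_size = bank_size

    def to_physical(self, virtual_address: int) -> int:
        # Assumed layout: banks are contiguous, each `bank_size` addresses long.
        return self.bank * self.bank_size + virtual_address


# One virtual register file per running thread (register index -> virtual address).
_thread_state = threading.local()


def get_register_file() -> dict:
    """Return the calling thread's register file, creating it on first use."""
    if not hasattr(_thread_state, "vrf"):
        _thread_state.vrf = {}
    return _thread_state.vrf


vrf = get_register_file()
vrf[0x20] = 0x40                         # V0 holds a virtual address

mapping = BankMapping(bank=0, bank_size=0x1000)
before = mapping.to_physical(vrf[0x20])  # resolves within bank 0

snapshot = dict(vrf)
mapping.bank = 2                         # remap to a different bank at runtime
after = mapping.to_physical(vrf[0x20])   # same register file, different bank
```

Only the mapping state changes between the two translations. The contents of the register file are identical before and after the remapping, which is the property the paragraph above relies on.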

Although not specifically illustrated as such, a non-transitory computing device readable medium for storing executable instructions can include all forms of volatile and non-volatile memory, including, by way of example, semiconductor memory devices, DRAM, PIM, HMC, EPROM, EEPROM, flash memory devices, magnetic disks such as fixed, floppy, and removable disks, other magnetic media including tape, and optical media such as compact discs (CDs), digital versatile discs (DVDs), and Blu-ray discs (BDs). The instructions may be supplemented by, or incorporated in, ASICs. For example, any one or more of the secondary storage 114, the registers 120, the cache 122, the main memory 112, and/or the memory array 111 illustrated in FIG. 1 can be a non-transitory computing device readable medium.

Although specific embodiments have been illustrated and described herein, those of ordinary skill in the art will appreciate that an arrangement calculated to achieve the same results can be substituted for the specific embodiments shown. This disclosure is intended to cover adaptations or variations of one or more embodiments of the present disclosure. It is to be understood that the above description has been made in an illustrative fashion, and not a restrictive one. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description. The scope of the one or more embodiments of the present disclosure includes other applications in which the above structures and methods are used. Therefore, the scope of one or more embodiments of the present disclosure should be determined with reference to the appended claims, along with the full range of equivalents to which such claims are entitled.

In the foregoing Detailed Description, some features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the disclosed embodiments of the present disclosure have to use more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.

What is claimed is:
1. A method, comprising: compiling source code to include references to a virtual register file to target what appears to be a register-memory architecture for data subject to a logical operation to be performed by a memory device on data stored in memory device elements as though the virtual register file was a physical register of a processor without using the physical register of the processor.
2. The method of claim 1, further comprising not transferring the data subject to the logical operation to the virtual register file.
3. The method of claim 1, wherein compiling the source code comprises compiling the source code at a host coupled to the memory device.
4. The method of claim 1, wherein the references to the virtual register file comprise indices to virtual vector registers in the virtual register file.
5. The method of claim 3, wherein the method includes storing virtual addresses in the virtual vector registers.
6. The method of claim 1, wherein the method includes performing the logical operation by the memory device at runtime.
7. The method of claim 1, wherein the method includes: compiling the source code to include a reference to the virtual register file for data subject to a memory operation; and dereferencing the reference at runtime to obtain physical addresses of a particular one of the memory device elements according to the virtual register file.
8. A method, comprising: performing a logical operation by a memory device on data stored in a particular physical address according to a portion of compiled source code associated with a virtual register file of a host device that addresses a particular virtual address, which targets the virtual register file as though it was a physical register of a processor of the host device without using the physical register of the processor of the host device, wherein the particular virtual address corresponds to the particular physical address.
9. The method of claim 8, wherein the virtual addresses are referenced by indices in the compiled source code.
10. The method of claim 8, further comprising translating the particular virtual memory address to the particular physical address.
11. The method of claim 10, wherein translating the particular virtual address is performed in a runtime environment of the memory device.
12. An apparatus, comprising: a memory device; and a host coupled to the memory device, wherein the host includes a processor, a physical register, and main memory, and wherein the host is configured to: load, at runtime by an application, a library including a virtual register file into the main memory of the host; and translate a particular virtual memory address, which targets the virtual register file as though it was the physical register, to a physical address of the memory device for a logical operation to be performed by the memory device without use of the physical register.
13. The apparatus of claim 12, wherein the host is further configured to create, at runtime, the virtual register file including virtual memory addresses of memory device elements.
14. The apparatus of claim 13, wherein the host is further configured to relocate the application in the main memory without editing the virtual register file.
15. The apparatus of claim 12, wherein the host includes a memory management unit configured to translate the particular virtual memory address to the physical address.
16. The apparatus of claim 12, wherein a runtime environment of the memory device is configured to translate the particular virtual memory address to the physical address.
17. A non-transitory computer readable medium storing instructions executable by a processor to: load, at runtime by an application, a library including a virtual register file into the main memory of the host; and translate a particular virtual memory address, which targets the virtual register file as though it was the physical register, to a physical address of the memory device for a logical operation to be performed by the memory device without use of the physical register.
18. The medium of claim 17, further including instructions to create, at runtime, the virtual register file including virtual memory addresses of memory device elements.
19. The medium of claim 18, further including instructions to relocate the application in the main memory without editing the virtual register file.
20. The medium of claim 17, further including instructions to translate the particular virtual memory address to the physical address via a memory management unit.